LANDSLIDENET: ADAPTIVE VISION FOUNDATION MODEL FOR LANDSLIDE SEGMENTATION

IEEE International Geoscience and Remote Sensing Symposium (2024)

Talks

Deep leanring

Author

于峻川 (Junchuan Yu)

Published

July 16, 2024

Poster

Motivation

Segment Anything Model (SAM) is a vision foundation model trained on more than 10 million images. It possesses zero-shot semantic segmentation capabilities and demonstrates strong performance in both natural and medical image segmentation scenarios. This suggests that SAM has powerful feature extraction capabilities. However, our tests revealed that SAM struggles with recognizing landslides, especially when the target features are similar to the background, regardless of whether prompt information is provided. Typically, transfer learning is employed to optimize the model. A common approach is to augment the original dataset with landslide samples and retrain SAM. This method results in a model with a large number of parameters, which is often prohibitive for most researchers. Another approach is to train only the last layer of the model while keeping all other parameters frozen. This method has limited effectiveness in improving recognition accuracy since it does not alter the feature extraction capabilities of the model’s encoder and is more susceptible to overfitting.

Solution

The method we employ involves adding a trainable tuning layer to the encoder to enhance the model’s feature extraction capabilities for landslides. The results indicate that LandslideNet achieves better recognition accuracy compared to other well-known semantic segmentation models. Additionally, LandslideNet has the fewest trainable parameters among the other models, which means that this method can be used to transfer vision foundation models to specific application scenarios with just a single GPU card. Enabling everyone to train large models is the greatest contribution of this method.

Furthermore, we discovered that this novel transfer learning method is not only applicable to landslide recognition but also to the recognition of damaged buildings, InSAR surface deformation features, and more. Consequently, there are numerous potential avenues for exploration based on this approach, such as dynamic model optimization.